Using Confidence Bounds for Exploitation-Exploration Trade-offs

Author

Peter Auer

Abstract

We show how a standard tool from statistics, namely confidence bounds, can be used to elegantly deal with situations which exhibit an exploitation-exploration trade-off. Our technique for designing and analyzing algorithms for such situations is general and can be applied when an algorithm has to make exploitation-versus-exploration decisions based on uncertain information provided by a random process. We apply our technique to two models with such an exploitation-exploration trade-off. For the adversarial bandit problem with shifting, our new algorithm suffers only $\tilde{O}((ST)^{1/2})$ regret with high probability over $T$ trials with $S$ shifts. Such a regret bound was previously known only in expectation. The second model we consider is associative reinforcement learning with linear value functions. For this model our technique improves the regret from $\tilde{O}(T^{3/4})$ to $\tilde{O}(T^{1/2})$.
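
As a rough illustration of how a confidence bound can drive exploitation-versus-exploration decisions, the sketch below runs a UCB1-style index policy on simulated stochastic Bernoulli arms. This is a simpler, swapped-in setting and not either of the two algorithms the abstract refers to (shifting adversarial bandits, linear value functions). The function name `ucb1`, the arm means, the seed, and the confidence width `sqrt(2 ln t / n_i)` are all assumptions chosen for the example.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Illustrative UCB1-style policy on simulated Bernoulli arms.

    arm_means, horizon and seed are hypothetical inputs for this sketch;
    they do not come from the abstract above.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms   # number of times each arm was pulled
    sums = [0.0] * n_arms   # total reward collected from each arm

    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Pull every arm once so each empirical mean is defined.
            arm = t - 1
        else:
            # Index = empirical mean (exploitation) + confidence width (exploration).
            arm = max(
                range(n_arms),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        # Simulated Bernoulli reward for the chosen arm.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


if __name__ == "__main__":
    print(ucb1([0.2, 0.8], horizon=2000))
```

The confidence width shrinks as an arm is pulled more often, so rarely tried arms keep a large index and are revisited occasionally, while arms with a high empirical mean are favoured most of the time.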

Similar resources

Learning and innovation: Exploitation and exploration trade-offs☆

This paper examines the relationship between learning and innovation outcomes, focusing on the trade-off between exploitation and exploration in learning and innovation. The study identifies two types of learning and two outcomes of innovation. Exploitation and exploration in learning are inversely associated with innovation rates and impact. While exploitative, localized ...

Balance Within and Across Domains: The Performance Implications of Exploration and Exploitation in Alliances

Organizational research advocates that firms balance exploration and exploitation, yet it acknowledges inherent challenges in reconciling these opposing activities. To overcome these challenges, such research suggests that firms establish organizational separation between exploring and exploiting units or engage in temporal separation whereby they oscillate between exploration and exploitation ...

Ethical Perspective: Five Unacceptable Trade-offs on the Path to Universal Health Coverage

This article discusses what ethicists have called “unacceptable trade-offs” in health policy choices related to universal health coverage (UHC). Since the fiscal space is constrained, trade-offs need to be made. But some trade-offs are unacceptable on the path to universal coverage. Unacceptable choices include, among other examples from low-income countries, to expand coverage for services wit...

On multilabel classification and ranking with bandit feedback

We present a novel multilabel/ranking algorithm working in partial information settings. The algorithm is based on 2nd-order descent methods, and relies on upper-confidence bounds to trade-off exploration and exploitation. We analyze this algorithm in a partial adversarial setting, where covariates can be adversarial, but multilabel probabilities are ruled by (generalized) linear models. We sho...
